Module 01

Reserve the first level headings (#) for the start of a new Module. This will help to organize your portfolio in an intuitive fashion.
Note: Please edit this template to your heart’s content. This is meant to be the armature upon which you build your individual portfolio. You do not need to keep this instructive text in your final portfolio, although you do need to keep module and assignment names so we can identify what is what.

Module 01 portfolio check

The first of your second level headers (##) is to be used for the portfolio content checks. The Module 01 portfolio check has been built directly into this template for you, and it is also available as a stand-alone markdown document on the MICB425 GitHub so that you know what is required in each module section of your portfolio. The completion status and comments will be filled in by the instructors during portfolio checks, when your current portfolios are pulled from GitHub.

  • Installation check
    • Completion status: done
    • Comments:
  • Portfolio repo setup
    • Completion status: done
    • Comments:
  • RMarkdown Pretty html Challenge
    • Completion status: done
    • Comments:
  • Evidence worksheet_01
    • Completion status: done
    • Comments:
  • Evidence worksheet_02
    • Completion status: done
    • Comments:
  • Evidence worksheet_03
    • Completion status: done
    • Comments:
  • Problem Set_01
    • Completion status: done
    • Comments:
  • Problem Set_02
    • Completion status: done
    • Comments:
  • Writing assessment_01
    • Completion status: done
    • Comments:
  • Additional Readings
    • Completion status: none
    • Comments:

Data science Friday

The remaining second level headers (##) are for separating data science Friday, regular course, and project content. In this module, you will only need to include data science Friday and regular course content; projects will come later in the course.

Installation check

Third level headers (###) should be used for links to assignments, evidence worksheets, problem sets, and readings, as seen here.

Use this space to include your installation screenshots.

Git Bash screenshot

[Screenshot: Git Bash]

RStudio screenshot

[Screenshot: RStudio]

GitHub screenshot

[Screenshot: GitHub]

Portfolio repo setup

Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.

Once in Git Bash

cd ./Documents

cd MICB425_portfolio

git status

git add .

git commit .

In the editor that opens, type a short title describing the portfolio edit as the commit message, then save and quit with:

:wq

git push

RMarkdown pretty html challenge

Paste your code from the in-class activity of recreating the example html.

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 1-17-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown

Here’s a Header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
I made this table with kable in the knitr package library

speed           dist
Min.   : 4.0    Min.   :  2.00
1st Qu.:12.0    1st Qu.: 26.00
Median :15.0    Median : 36.00
Mean   :15.4    Mean   : 42.98
3rd Qu.:19.0    3rd Qu.: 56.00
Max.   :25.0    Max.   :120.00

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!

Origins and Earth Systems

Evidence worksheet 01

The template for the first Evidence Worksheet has been included here. The first thing for any assignment should be link(s) to any relevant literature (which should be included as full citations in a module references section below).

You can copy-paste in the answers you recorded when working through the evidence worksheet into this portfolio template.

As you include Evidence worksheets and Problem sets in the future, ensure that you delineate Questions/Learning Objectives/etc. by using headers that are 4th level and greater. This will still create header markings when you render (knit) the document, but will exclude these levels from the Table of Contents. That’s a good thing. You don’t want to clutter the Table of Contents too much.

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?

Given the abundance of prokaryotes on earth, how do we calculate the total carbon mass, nitrogen mass, phosphorus mass, number of organisms, and where are they mostly found?

  • What were the primary methodological approaches used?

The Earth was divided into a series of environments, and for each one a set of calculations was applied to estimate the total number of organisms from average abundances within a fixed volume or area. Each environment was studied, and each estimate was supported by references. In addition, assumptions were applied to standardize the distribution of prokaryotes over a given environmental niche.

  • Summarize the main results or findings.

Prokaryotes number 4-6 x 10^30 cells and contain 350-550 Pg of carbon, which amounts to about half of Earth’s total biomass. Prokaryotes contain more nutrients than plants, constituting the largest nutrient pool on Earth. Prokaryotes are found mainly in the ocean, in soil, and in subsurface masses in the Earth’s crust.

  • Do new questions arise from the results?

Should our calculations of Earth’s total biomass be revised to account for non-uniformity in terrain? Could different continental areas contain significantly different densities of microbes? Are there still environments harbouring prokaryotes that we have yet to discover?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

The results of the paper rely on many other studies whose calculations or estimates may not be very accurate. As we understand it, this is the “best estimate” scenario given the literature and technology available at the time. Many assumptions were used to arrive at the final figures, and not all of them were justified. That said, the figures presented were argued to be accurate to within a certain order of magnitude, which is telling of the precision of the calculations that were performed. The authors answered the research questions by first accounting for the environments that contribute most to the prokaryotic count, and then moved on to more specific environments that, despite being rather large in magnitude, did not contribute heavily to the overall cell and mass counts. Figures and tables summarized the counts gathered from each environment, on which the overall calculations were based. They were in general easy to understand but left out key variations that I believe are crucial to the final result.

Problem set 01

Whitman et al 1998

Learning objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.

The primary prokaryotic habitats on Earth are the oceans (referring specifically to bodies of water), mountains and the subterranean layers of land, forests (the majority on leaves), and underwater sediment layers.

  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?

The upper 200 m of the ocean contains about 3.6 x 10^28 prokaryotic cells (360 x 10^26), at a density of about 5 x 10^5 cells/mL. Cells are less dense below the first 200 m, but the volume of that layer is larger than that of the top 200 m. Marine cyanobacteria, including Prochlorococcus, represent about 8% of the cells in the upper 200 m. Marine cyanobacteria such as Prochlorococcus obtain their energy from sunlight via photosynthesis, which produces oxygen while fixing carbon. Despite being only 8% of the prokaryotic cell abundance in the upper 200 m, they are responsible for approximately 50% of the oxygen in the atmosphere and contribute greatly to carbon cycling, as demonstrated by their quick turnover time; annual production in the upper 200 m is about 8.2 x 10^29 cells/year.

Upper 200 m: 3.6 x 10^28 cells at 5 x 10^5 cells/mL
Cyanobacteria: (4 x 10^4 cells/mL) / (5 x 10^5 cells/mL) x 100 = 8%
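The 8% figure can be checked with a few lines of arithmetic. This is only an illustrative sketch: it is written in Python rather than the R used elsewhere in this portfolio, the variable names are my own, and the inputs are the densities and cell count quoted above.

```python
# Values quoted above for the upper 200 m of the ocean.
total_density = 5e5    # all prokaryotes, cells/mL
cyano_density = 4e4    # cyanobacteria including Prochlorococcus, cells/mL
total_cells = 3.6e28   # all prokaryotes, cells

# Fraction of cells that are cyanobacteria, and the implied cell count.
cyano_fraction = cyano_density / total_density
cyano_cells = cyano_fraction * total_cells

print(f"Cyanobacterial fraction: {cyano_fraction:.0%}")
print(f"Implied cyanobacterial cells: {cyano_cells:.1e}")
```

The fraction works out to 8%, which corresponds to roughly 2.9 x 10^27 cyanobacterial cells if the same fraction holds across the whole layer.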

  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?

Autotrophs in this text are bacteria that produce their own food, primarily using energy from the sun. As a result, these prokaryotes are often found in surface environments that receive some amount of sunlight. They are <10% of upper layer marine prokaryotes. However, they form the majority of prokaryotes in soil and the subsurface. Thus, they are defined as primarily land-dwelling organisms. Heterotrophs make up the majority of prokaryotic organisms, with most of them found below 200 m. They are defined as the most abundant sea-dwelling organisms. Lithotrophs are subsurface prokaryotes that use a different method of energy generation. They are defined as mysterious, are found primarily in subsurface environments, and are scarcer than other types of prokaryotes.

Autotroph: “self nourishing”, fixes inorganic carbon into biomass
Heterotroph: assimilates organic carbon
Lithotroph: uses inorganic substrates

  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

The Mariana Trench is the deepest part of the ocean, and we know that it is an environment that supports prokaryotic life, even though almost no light reaches this depth. It is therefore the deepest ocean habitat known to support life. Because the paper deduces that subsurface sediments below the water layer also contain prokaryotes, we could argue that the deepest habitat hosting prokaryotic life would be the subsurface sediment layer of the Trench. Subsurface environments on land may contain prokaryotes at depths below that of the Mariana Trench. However, not much is currently known about life at these depths, due to challenges in retrieving uncontaminated samples from these areas. The text discusses how, in subsurface environments, the limited carbon nutrition available to these organisms suggests that the majority are metabolically inactive or non-viable. However, evidence shows that metabolic activity is on par with that of surface prokaryotes. Because most of the available carbon nutrients originate at the surface, the primary limiting factor would be the transfer of carbon nutrients from the surface to deeper subsurface environments, which logically decreases the deeper you go.

  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?

Prokaryotes have been found in the atmosphere at altitudes as high as 57-77 km. Mount Everest (8,848 meters) is the highest geographical location on Earth, and therefore would technically be the highest habitat capable of supporting prokaryotic life. Is it capable of supporting prokaryotic life? Primary limiting factors at this height include temperature, although some prokaryotes, the psychrophiles, have adapted to such low temperatures. Nutrients are also limited at high altitude. Fewer atoms are found in the upper atmosphere, so less material is available to compose the building blocks of life. This would result in slower growth. UV radiation as well as pressure are limiting to life at high altitudes because they can damage cells.

  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?

Taking the lowest and highest points, the vertical distance is about 24 km: the “skin” of the world. The biosphere of the Earth is a relatively narrow band.
Lower limit: the Mariana Trench is 10,994 meters deep, but the lower limit is deeper still because it includes subsurface sediments, which extend about 4.5 km further.
Upper limit: Mount Everest is 8,848 m high, but the upper limit is much higher if the atmosphere is included as a “habitat”.
Vertical distance of the Earth’s biosphere: 19.84 km + 4.5 km = 24 km (+ potential atmosphere)
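As a quick check of the sum above, here is a minimal Python sketch. The variable names are my own, and the 4.5 km of sediment is the value assumed in this answer, not a measured limit.

```python
# Depths and heights quoted above, in kilometres.
mariana_trench_depth_km = 10.994  # deepest point of the ocean
subsurface_sediment_km = 4.5      # extra depth assumed for sediments below the trench
everest_height_km = 8.848         # highest point on land

# Trench floor to summit, then extended downward by the sediment layer.
surface_span_km = mariana_trench_depth_km + everest_height_km
biosphere_span_km = surface_span_km + subsurface_sediment_km

print(f"Trench floor to summit: {surface_span_km:.2f} km")
print(f"Including sediments: {biosphere_span_km:.1f} km")
```

This excludes the atmosphere, so it is a lower bound on the vertical extent of the biosphere.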

  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)

Annual cellular production, in cells/year x 10^29, was calculated with the following formula:
Cells/year = Population size x (365 / turnover time [days])
or, equivalently:
Cells/year = Population size x (turnovers/year)

Marine heterotrophs: (3.6 x 10^28 cells x 365 days/year) / (16 days/turnover) = 8.2 x 10^29 cells/year
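The same formula can be written as a small function. This is a sketch in Python with a hypothetical function name, using the marine heterotroph values quoted above as the example input.

```python
def annual_cell_production(population_cells, turnover_time_days):
    """Cells produced per year = population size x turnovers per year."""
    turnovers_per_year = 365 / turnover_time_days
    return population_cells * turnovers_per_year


# Marine heterotrophs in the upper 200 m: 3.6e28 cells, one turnover every 16 days.
marine_heterotrophs = annual_cell_production(3.6e28, 16)
print(f"{marine_heterotrophs:.1e} cells/year")
```

The function can be reused for any other habitat by passing that habitat's population size and turnover time.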

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?

Carbon content together with carbon assimilation efficiency determines the upper bound on the turnover rates seen in the upper 200 m of the ocean. This varies with depth in the ocean, and between terrestrial and marine habitats, because the abundance of carbon in each habitat is different.
Carbon assimilation efficiency = 20% (an assumption the authors make). This gives a multiplier of 4 that is applied to total carbon below; we are not sure how it is derived.
Total carbon = average carbon per cell x number of cells
Average carbon per cell: 20 fg C/cell = 20 x 10^-30 Pg C/cell
(3.6 x 10^28 cells) x (20 x 10^-30 Pg C/cell) = 0.72 Pg C in marine heterotrophs
4 x 0.72 Pg C = 2.88 Pg C per turnover
51 Pg C/year, of which 85% is consumed = 43 Pg C/year
(43 Pg C/year) / (2.88 Pg C/turnover) = 14.9 turnovers/year
365 days / 14.9 turnovers = ~24 days per turnover
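The chain of unit conversions above is easy to get wrong by hand, so here is a hedged Python sketch of the same steps. The variable names are my own, and the multiplier of 4 is taken from the notes above as given, not derived.

```python
FG_TO_PG = 1e-30  # 1 fg = 1e-15 g and 1 Pg = 1e15 g

cells = 3.6e28                    # marine heterotrophs, upper 200 m
carbon_per_cell_fg = 20           # average carbon per cell, fg C
efficiency_multiplier = 4         # factor used above for 20% efficiency (taken as given)
carbon_input_pg_per_year = 51     # Pg C/year, as quoted above
fraction_consumed = 0.85          # share of that carbon that is consumed

# Standing stock of carbon in cells, and carbon needed for one turnover.
standing_carbon_pg = cells * carbon_per_cell_fg * FG_TO_PG
carbon_per_turnover_pg = efficiency_multiplier * standing_carbon_pg

# Carbon available per year sets the maximum number of turnovers.
carbon_available_pg = carbon_input_pg_per_year * fraction_consumed
turnovers_per_year = carbon_available_pg / carbon_per_turnover_pg
days_per_turnover = 365 / turnovers_per_year

print(f"Standing carbon: {standing_carbon_pg:.2f} Pg C")
print(f"Turnovers per year: {turnovers_per_year:.1f}")
print(f"Days per turnover: {days_per_turnover:.1f}")
```

Because the sketch does not round 43.35 Pg C/year down to 43, it gives about 15 turnovers per year instead of 14.9; either way the result is roughly 24 days per turnover.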

  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

Mutation rate: 4 x 10^-7 mutations/generation
For 4 mutations to happen at once: (4 x 10^-7)^4 = 2.56 x 10^-26 mutations/generation
Turnovers: 365 / 16 = 22.8 turnovers/yr
Cell production: (3.6 x 10^28 cells) x 22.8 turnovers/yr = 8.2 x 10^29 cells/yr
Frequency: (8.2 x 10^29 cells/yr) x (2.56 x 10^-26 mutations/cell) = 2.1 x 10^4 sets of 4 simultaneous mutations/yr
Time between events: ((365 d/yr)(24 h/d)) / (((4 x 10^-7)^4 mutations/cell)(8.2 x 10^29 cells/yr)) = 0.42 h per set of 4 simultaneous mutations
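The calculation above can be sketched in Python as follows. The variable names are my own, and the inputs are the mutation rate from the question and the annual cell production of marine heterotrophs worked out earlier.

```python
mutation_rate = 4e-7          # mutations per gene per DNA replication
simultaneous_mutations = 4
cells_per_year = 8.2e29       # annual production of marine heterotrophs
hours_per_year = 365 * 24

# Probability that a single replication carries all four mutations at once.
p_four = mutation_rate ** simultaneous_mutations

# Expected number of such events per year, and the time between them.
events_per_year = cells_per_year * p_four
hours_per_event = hours_per_year / events_per_year

print(f"P(four at once): {p_four:.2e}")
print(f"Events per year: {events_per_year:.1e}")
print(f"Hours per event: {hours_per_event:.2f}")
```

Swapping in the population size and turnover time for marine autotrophs would give the corresponding figure for that group.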

  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?

A large mutation rate means that there is great potential for multiple point mutations in a single replication. This allows for quick adaptation by creating a more diverse pool of mutants to be selected from. Genetic diversity will be extremely high when small-scale changes to sequence are considered, and long-term “species” level biodiversity will mostly be determined by competition and environmental pressures. Point mutations are not the only mechanism: horizontal gene transfer can allow new genes to proliferate in a microbial community, assuming the gene is successful in the organism it is “born” in.

  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?

High abundance allows for high diversity by increasing the potential for mutations and simultaneous mutations. Metabolic potential is dependent on both abundance and diversity. Diversity determines the pool of available genes to be used in metabolic pathways and abundance determines the magnitude of the effect of these pathways.

Evidence worksheet 02

Kasting and Siefert, 2003; Nisbet and Sleep, 2001

Learning objectives:

Comment on the emergence of microbial life and the evolution of Earth systems

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

    • 4.6 billion years ago
      Moon formation gives Earth its spin and tilt. High global temperature. Zircon formation (oldest known) at 4.4 Ga.
    • 4.2 billion years ago
      Late heavy bombardment. Slight evidence of life, in the form of graphite in zircon, at 4.1 Ga.
    • 3.8 billion years ago
      Plate subduction.
    • 3.75 billion years ago
      Water present only as vapour.
    • 3.5 billion years ago
      Methanogen proliferation (methanogenesis) gives a very warm Earth despite a much dimmer sun back then. Photosynthesis began during this period, with evidence of Rubisco. Proper evidence of life at 3.5 Ga.
    • 3.0 billion years ago
      First glaciation; oxygen in the atmosphere decreases the greenhouse effect. Life is mostly underwater at this point.
    • 2.7 billion years ago
      Gene transfer was probably the primary mechanism by which new adaptations formed; de novo synthesis of new genes was not very likely.
    • 2.2 billion years ago
      Life on land. First indication of eukaryotic life during this period. Second glaciation and carbon explosion.
    • 2.1 billion years ago
      Symbiosis manifestation. Mitochondrial and chloroplast complexity increase. Rocks recognized as red beds; O2 levels increase due to eukaryotes evolving to produce O2.
    • 1.3 billion years ago
      Snowball Earth (1 billion years ago).
    • 550 million years ago
      Emergence and development of complex plants (0.4 Ga); fish, insects, and tetrapods on land. Greatest ecological diversity during this period. Rapid expansion and evolution. Permian extinction (95% of all life).
    • 200,000 years ago
      Humans arise.
  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean
      Formation of the solar system
    • Archean
      Meteorite bombardment and its cessation lead to sea water formation and sedimentary rock formation
    • Precambrian
      Gene transfer was probably the primary mechanism by which new adaptations formed; de novo synthesis of new genes was not very likely
    • Proterozoic
      Symbiosis manifestation. Mitochondrial and chloroplast complexity increase
    • Phanerozoic
      Cambrian explosion: multicellular life and animal emergence (0.54 Ga). Continental drift and glaciation. Filling of oxygen into Earth’s atmosphere. Mammal species emergence. Global warming: CO2 rise

Problem set 02

Falkowski et al., 2008; Zehnder, 1988

Learning objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?

The Earth’s tectonic and atmospheric photochemical processes form the geochemical cycle that allows life to exist. These processes drive molecular interactions and the formation and breaking of chemical bonds, so that equilibrium is never reached and substrates are continually renewed for life to use. Life influences the Earth’s climate and composition through redox reactions. Microbes catalyze these redox reactions and fundamentally alter the Earth’s redox state. In turn, the Earth cycles the products of these reactions back, creating a feedback cycle in which both activities are linked.

  • Why is Earth’s redox state considered an emergent property?

Fluxes of five elements (H, C, N, O, and S) are controlled primarily by redox reactions that are carried out by life. These reactions, initiated by microbes, fundamentally alter the redox state of the surface of the planet. The Earth’s current redox state is an emergent property because it exists at a point of balance between the redox state created by the tectonic activity of the Earth and the redox state that would result from microbial activity. Therefore, the Earth’s redox state exists only because of the existence of life on Earth.

  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?

Elemental cycles are often controlled by a series of redox reactions in tension with each other. Identical or near-identical pathways are used for both the forward and reverse directions of the reactions that maintain the cycles. Microbes cooperate synergistically with other species to propagate these cycles: one microbe uses one direction of a cycle for energy production, while another species uses the opposite direction for bioassimilation, which expends energy. These activities are thus able to overcome barriers to reversible electron flow by sacrificing efficiency of energy transfer for continuation of the cycle. Another source of energy used to overcome many of the energetic barriers to electron flow is light, which drives photooxidative processes.

  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?

Nitrogen fixation converts N2 gas into NH4+, which is a reductive process. The highly evolutionarily conserved nitrogenase enzyme carries out this step and is inhibited by oxygen, so this step occurs in anoxic environments. Oxidation of NH4+ to NO2- occurs only in the presence of oxygen, and thus in an oxygenated environment. Different bacteria then further oxidize nitrogen to NO3-, also in an oxygenated environment. NO2- and NO3- are also used as oxidants in the absence of oxygen, which returns nitrogen to N2; this process thus occurs in an anoxic environment. The emergence of the nitrogen cycle, which gave rise to the most abundant gas currently in the atmosphere, would have lowered CH4 levels and, along with the rise of oxygen levels, caused a decrease in global temperatures.

  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?

Metabolic diversity to an extent drives microbial diversity. This is because oxidative and reductive metabolic processes often exist in different organisms: one organism uses either the reductive or the oxidative portion of an elemental cycle while another uses the other half. It is known that metabolic proteins or even whole metabolic pathways can be transferred horizontally from one microbe to another. The extent of this is controlled by nutrient and bioenergetic selective pressures (whether or not such evolution would result in a greater ability to utilize or obtain energy). The discovery of new protein families in microbial community genomes indicates that we have only begun to scratch the surface of the evolutionary diversity in nature that arises from these selective pressures. This discovery process is roughly linear with the number of new genomes sequenced. As it currently stands, there is a potentially unlimited quantity of genetic diversity in microbes. However, its distribution would be limited by the environments microbes are found in, with the caveat that a large portion of the cellular machinery for all different kinds of metabolism is harbored in microbes that are not necessarily using it actively for energy production.

  • On what basis do the authors consider microbes the guardians of metabolism?

Microbes are the guardians of metabolism on the basis that they are responsible for maintaining the core planetary gene set, which comprises all the genes encoding the metabolic machinery needed to take advantage of every metabolic environmental niche on the planet. They can do this because viable bacteria of any particular functional type can re-grow from almost any environmental niche, even if that environment cannot initially support their growth. This is attributed to the slow decay of microbial biomass relative to its propagation, through dormancy or through sporulation.

Evidence Worksheet 03

Martinez et al., 2007

Learning objectives:

Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General Questions

  • What were the main questions being asked?

What are proteorhodopsins (PRs)? How much PR is expressed in the ocean’s photic zone? Which biochemical and genetic pathways are PRs involved in? How are PRs transferred between organisms and what is the minimal requirement for transfer? What are the minimal heterologous genetic-level transfers required for the transfer of a phenotype?

  • What were the primary methodological approaches used?

Fosmid library screening (reference library scans)
PR expression screening (high-density colony macroarrays)
In vitro transposition and full fosmid sequencing (fosmid cloning; in vitro transposition using the EZ-Tn5 insertion kit; electroporation and antibiotic selection; sequencing using an ABI Prism 3700 and the BigDye version 3.1 cycle sequencing kit)
Carotenoid extraction (cell harvest, sonication, filtration, evaporation)
HPLC (chromatographic separation and analysis)
ATP measurement (luciferase-based assay in 96-well plates)
Proton pump experiments (RPC-100 photosynthetic chamber and pH measurement)

  • Summarize the main results or findings.

Colonies were found to produce orange pigment in the absence of exogenous retinal. The PR photosystem was identified and localized within the fosmids that were cloned. The authors took DNA from the environment, put it into E. coli (a heterologous host), and programmed it to produce a phenotype not normally found in E. coli. They found a way to mine the uncultivated diversity of microbes in the environment, and they found the minimal requirements for transferring a phenotype to another organism. These requirements were found to be few (around 7). Thus, transfer was found to be very widespread.

  • Do new questions arise from the results?

If we were to extrapolate, what would we see as the end of the Anthropocene in the future? Is the definition yet to be formalized and solidified? What activities would be characteristic of non-Anthropocene times? How does this link into our understanding of where climate change is headed? Will we be able to reverse the effects of end-of-Anthropocene changes?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

The authors used the narrow scope of proteorhodopsins to address the question of the origins of the Anthropocene. We felt that, given the number of changes taking place during this period, this is a rather narrow angle from which to build a robust understanding of how the Anthropocene came about. Methods focused primarily on biochemical analyses to determine the prevalence of microbes containing proteorhodopsin. These methods are not easily interpreted by persons without a background in biochemistry. Furthermore, the results are extrapolated to the distribution of proteorhodopsin in the ocean.

Writing Assignment 01

Ian Lee, 44968139

Homo sapiens. Beings who by use of technology have elevated themselves past the limits of evolutionary constraint, becoming dictators of their own fate, along with the fate of the Earth. Their success has led to a population explosion now numbering about 7 billion individuals. However, in doing so, they have begun to resemble a cancer that is consuming and destroying Earth’s regulatory systems. The Earth thus currently sits at a crucial turning point that will determine its future. Within it rest two key players in its transformation: one whose metabolic engines have been driving its change for eons, and one whose recent emergence as an environmental force has been so large that it surpasses much of the natural order before it. Microbes, whose steady evolution molded Earth’s elemental equilibrium to the point of providing abundant oxygen for aerobic respiration and sequestering the elements that constitute the building blocks of organic matter, have been the custodians of this planet for billions of years, yet are now being overthrown by the species claiming dominion over all. In this essay, I will present a defense of the notion that microbial life has existed easily without human intervention since the dawn of the Archean (Nisbet & Sleep, 2001), whilst promoting the notion that humans rely on microbes for key processes that are essential for sustaining life, and that, as we continue to exert large changes on our environment, we risk disrupting the delicate balance created by microbes that enables the survival of the human species.
This disruption of complex elemental cycling is ultimately insurmountable even by the vast capacity of human innovation, thus dooming us to extinction should we carelessly venture past the boundaries of ecological provision. The reasons why this is true branch into three main chains of thought that I will elaborate upon through the course of this essay: 1) Microbial engines that have existed without the need for human intervention drive the world at a magnitude and complexity impossible for humans to replicate. 2) Even if we could in the future replicate these processes, the result of such intervention would more likely than not be extremely harmful towards microbes and ourselves. 3) There is a fundamental human weakness in addressing problems of this nature, which lies in the social and psychological impetus for change at a global scale.

The very existence of microbial metabolism throughout evolutionary time is evidence of the independence of microbes from human intervention (Nisbet & Sleep, 2001). The nitrogen and oxygen cycles are key drivers of life on Earth and are especially relevant to the survival of the human species (Falkowski, Fenchel & Delong, 2008). The oxygen cycle is linked to the nitrogen cycle, and its utility is obvious: it allows for aerobic respiration, on which humans obligately rely. Nitrogen cycling, however, bears heavily on the question of how we feed ourselves (Canfield, Glazer & Falkowski, 2010). Since the start of the industrial age we have relied on the Haber process to fix N2 into NH4. The magnitude of our reliance is such that global Haber-Bosch processes contribute 120 million tons of NH4 annually, constituting 40% of total nitrogen fixation on Earth (Rockström et al., 2009; Canfield, Glazer & Falkowski, 2010). Microbes, in contrast, are responsible for the remaining 60%, as well as for all the counter-reactions of denitrification that equal the magnitude of nitrogen fixation globally. We know little about how long it took for localized metabolic activity to begin altering the Earth on a global scale. We also know little about how the co-evolution of metabolic pathways gave rise to complete elemental cycles resembling the ones essential to sustaining life. These processes likely came about through complex horizontal gene transfer mechanisms between species, which humans have not yet been able to replicate (Falkowski, Fenchel & Delong, 2008). This gives us pause when considering the feasibility of synthesizing a technologically driven nitrogen cycle. For one, consider the magnitude of production that has gone into providing the fertile ground for human crop cultivation. 
Microbes have already been shown to contribute the majority of the reduction of N2 to NH4 and all of the denitrification/anammox reactions balancing it. Replacing just this step of the cycle would require an engineering solution equal in magnitude to our global fertilization efforts, while increasing global nitrogen fixation through the Haber process to 2.5 times its current output. It is therefore reasonable to expect that humans would struggle to survive. We do not currently have an industrial process mirroring denitrification. We must therefore keep in mind that any effort to build one would be solely for ensuring our continued survival through maintenance of the nitrogen cycle, and we can expect it to place significant strain on our capacity to feed ourselves without microbes. To justify the claim that microbes do not require human intervention, one need only consider the scenario of humanity disappearing. The realistic expectation is that the nitrogen cycle would return to a steady state after a bloom of denitrifying bacteria and archaea (Falkowski, Fenchel & Delong, 2008).
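
The 2.5 figure quoted above follows directly from the 120 million tons and the 40% share. A minimal sketch in R, taking both numbers at face value as cited from Canfield, Glazer & Falkowski (2010); the variable names are illustrative only:

```r
# Figures quoted in the essay, taken at face value
haber_fixation_mt <- 120   # million tons of N fixed per year by Haber-Bosch
haber_share       <- 0.40  # fraction of global fixation attributed to Haber-Bosch

# Implied global total and the microbial contribution (the remaining 60%)
total_fixation_mt     <- haber_fixation_mt / haber_share
microbial_fixation_mt <- total_fixation_mt - haber_fixation_mt

# Factor by which industrial fixation would have to grow to cover both shares
scale_up_factor <- total_fixation_mt / haber_fixation_mt
scale_up_factor
```

Under these assumptions the implied global total is about 300 million tons per year, of which roughly 180 million tons are microbial.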

Taken at their best, the opposing arguments point to synthetic life or large-scale geoengineering as a way to counter the loss of microbes, and assume that our grasp of technology is superior to our circumstances. Solutions would then come by way of genetically modified organisms, and energy production could be handled through solar-powered plants (Achenbach, 2012). What has not been considered here is that humanity fails in its application of technology in two major ways: first through ecological and environmental damage that could jeopardize our survival as a species, and second through the creation of anthropogenically driven climate change (Achenbach, 2012). Our technology is not as robust as we often think it is. Examples such as the Fukushima nuclear plant accident and the BP oil spill show that even with a host of systems in place to guard against catastrophic failure, these events still occur (Griffin, 2015; World Nuclear Association, 2018). Humans are also responsible for creating imbalances within the Earth’s geochemical cycles. Although humans developed the Haber process in the 20th century, the process is only 40% efficient at delivering reduced nitrogen to crops (Canfield, Glazer & Falkowski, 2010). This has had two effects: fertilization of crops became an expensive endeavor, and large amounts of NH4 made their way into waterways, inducing large zones of eutrophication and hypoxia. Continued use of nitrogen-based fertilizers in this manner will lead to a new balance in the nitrogen cycle, permanently altering the viability of marine life (Canfield, Glazer & Falkowski, 2010; Rockström et al., 2009). This in turn decreases the availability of fish for us to eat, ironically limiting our food supply further. Historically, humans have mismanaged the resources available to them even when they were relatively abundant. 
An example of this can be found in the American Southwest, where overgrazing caused the grassland to degrade into ever more worthless grasses, shrubs and weeds (Leopold, 1995). It is reasonable to expect that even with the technological capacity to overcome the limitations that a lack of microbes brings, humans would resort to less than ideal methods to do so. We can extrapolate that further devastation of the Earth’s biosphere would result, such that most life on Earth would eventually perish. If humans, driven by the prospect of short-term gains, fail to adequately manage what little chance they have in the absence of microbes, they would fail to survive despite having the technological capacity to do so (Leopold, 1995). In fact, our current struggle to mitigate and manage the onset of climate change indicates that if we had no choice but to replace the nutrient cycling functions that microbes provide, then even granting that technology would someday allow it, we would either fail or, in succeeding, create an uninhabitable planet from which we must also protect ourselves.

This leads into my final contention, which concerns the psycho-social factors involved in responding to the need for change. An event as significant as the loss of all microbial life from Earth would require a level of international cooperation that has yet to materialize (Achenbach, 2012). Because political willpower is lacking, and because humans are often motivated more strongly by short-term gains than by long-term consequences, we are already unlikely to meet the climate change goals set as guard rails to secure humanity’s safety (Raftery, Zimmer, Frierson, Startz & Liu, 2017). One could argue that economic forces would eventually incentivize humanity to make drastic shifts towards responsible change, but such reactionary movements fail to address the problem sufficiently, despite the technology and the rationale being available. Humans therefore present a unique challenge to their own survival because of their own psychological motivations. In our laziness, responsibility for tasks that are essential to our survival but provide no immediate benefit often gets passed on to the government, on the assumption that it will be taken care of. In turn, as we have seen with climate change and in the field of conservation, this results in a dynamic of minimizing the problem. Humanity thus intentionally ignores these sources of existential crisis, failing to fully grasp the gravity of the situation and, even when it does, not acting at sufficient magnitude to stop it (Leopold, 1995). Our current economic and educational landscapes diverge from the requirement that mankind develop a strong land ethic, as described by Leopold (1995), to ensure that what we currently have lasts for many generations to come. Therefore, even with the technology, premise and resources to do so, and even with the luck to succeed without mishap, we as humans could fail because we lack the motivation.

In conclusion, the inability of humankind to mitigate the effects of a loss of all microbial life boils down to three factors: first, our inability to replicate what microbes do; second, our rate of failure even when we try; and third, our lack of motivation to act. Because of these three factors and the underlying mechanisms described above, it is unlikely, if possible at all, that humans could survive without microbes. Microbes, on the other hand, having maintained an evolutionary equilibrium for many millions of years, would simply return the Earth to its stable state in the absence of human activity. This problem closely mirrors the existential threat that we currently face, climate change, and raises key questions as to how we should proceed. The solution to this issue lies not just in a greater volume of education, but in a greater quality of education (Leopold, 1995). It frames the problem as not just a technological struggle to control the world around us, but also a psychological need to do so. Perhaps it would serve us better to see ourselves as microbes have been for billions of years, and to now take on the responsibility of being custodians of this planet, even as we wrestle with our ever-growing power as the species that surpassed evolution.

Module 01 references

Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.

Achenbach, J. (2012). Spaceship Earth: A new view of environmentalism. Washington Post. Retrieved 16 February 2018, from https://www.washingtonpost.com/national/health-science/spaceship-earth-a-new-view-of-environmentalism/2011/12/29/gIQAZhH6WP_story.html?utm_term=.c18bd3664b50

Canfield, D., Glazer, A., & Falkowski, P. (2010). The Evolution and Future of Earth’s Nitrogen Cycle. Science, 330(6001), 192-196. http://dx.doi.org/10.1126/science.1186120

Falkowski, P., Fenchel, T. and Delong, E. (2008). The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science, 320(5879), pp.1034-1039.

Griffin, D. (2015). 5 years after the Gulf oil spill: What we do (and don’t) know. CNN. Retrieved 16 February 2018, from https://www.cnn.com/2015/04/14/us/gulf-oil-spill-unknowns/index.html

Kasting, J. and Siefert J. (2002). Life and the Evolution of Earth’s Atmosphere. Science, 296(5570), pp.1066-1068.

Leopold, A. (1995). A Sand County almanac, and sketches here and there. Norwalk, Conn.: Easton Press.

Martinez, A., Bradley, A., Waldbauer, J., Summons, R. and DeLong, E. (2007). Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proceedings of the National Academy of Sciences, 104(13), pp.5590-5595.

McAlpine, C., Seabrook, L., Ryan, J., Feeney, B., Ripple, W., Ehrlich, A., & Ehrlich, P. (2015). Transformational change: creating a safe operating space for humanity. Ecology And Society, 20(1). http://dx.doi.org/10.5751/es-07181-200156

Nisbet, E. and Sleep, N. (2001). The habitat and nature of early life. Nature, 409(6823), pp.1083-1091.

Raftery, A., Zimmer, A., Frierson, D., Startz, R., & Liu, P. (2017). Less than 2 °C warming by 2100 unlikely. Nature Climate Change, 7(9), 637-641. http://dx.doi.org/10.1038/nclimate3352

Rockström, J. et al. (2009). A safe operating space for humanity. Nature, 461(7263), 472-475.

Whitman, W., Coleman, D., & Wiebe, W. (1998). Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences, 95(12), 6578-6583.

Zehnder, A.J.B. and Stumm, W. (1988). Geochemistry and biogeochemistry of anaerobic habitats. Biology of anaerobic microorganisms. Wageningen University.pp.1-38.

World Nuclear Association. (2018). World-nuclear.org. Retrieved 16 February 2018, from http://www.world-nuclear.org/information-library/safety-and-security/safety-of-plants/fukushima-accident.aspx

Module 02

Module 02 Portfolio Content

  • Evidence worksheet_04
    • Completion status: done
    • Comments:
  • Problem Set_03
    • Completion status: done
    • Comments:
  • Writing assessment_02
    • CANCELED
  • Additional Readings
    • Completion status: none
    • Comments

Evidence worksheet 04

Ottesen et al., 2014

Learning objectives:

  • Discuss the relationship between microbial community structure and metabolic diversity
  • Evaluate common methods for studying the diversity of microbial communities
  • Recognize basic design elements in metagenomic workflows

General Questions:

. What were the main questions being asked?

Are there diel cycles present in heterotrophs? How about in bacteria? What about archaea?

How can we better understand temporal transcriptional dynamics in bacterioplankton communities?

  1. For oceanic bacterioplankton, what are the temporal transcription dynamics with communities in terms of oligonucleotides?

  2. Do the expression patterns correlate with other species present and contribute to coordinated biogeochemical activities?

. What were the primary methodological approaches used?

RNA sequencing of microbial communities using Illumina MiSeq.

Multi-day time series of bacterioplankton from the N. Pacific subtropical region.

Automated Lagrangian sampling every 2 hours over 3 days. Microbial RNA was converted to cDNA and sequenced to assess whole-genome transcriptome dynamics of the predominant planktonic microbial populations. A robotic environmental sampler was used to measure temperature, salinity, chlorophyll and transmission.

. Summarize the main results or findings.

Dominant taxa: Prochlorococcus and other proteorhodopsin-containing or photoheterotrophic bacteria.

Some microdiversity (phylogenetic analysis of gene transcripts). Transcript activity of Prochlorococcus depends on the time of day. No obvious trends of fuid sharing peak expression during mid-day. Roseobacter showed strong diel oscillations in transcriptome profile (notably in the expression of genes involved in bacteriochlorophyll-associated aerobic anoxygenic photosynthesis). Proteorhodopsin-containing heterotrophs: evidence of diel patterns in many gene transcripts. Temporal transcript variation observed.

. Do new questions arise from the results?

How many species are involved in one reaction to occur?

Do activities coexist with each other (phototrophs and heterotrophs)? Is there regulation across all oceans and bodies of water?

. Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

Kyoto encyclopedia of genes and genomes: Resource reference not yet encountered. Possibly an older source reference. PCA analysis: not yet encountered. Lab vs wild transcripts are a strong point of analysis. Weak points of analysis: sampling bias due to small sample sizes? Inconsistent with transcript expression? Controls? Water mixing and salinity/temperature effects?

Problem set 03

Madigan et al., n.d.

Wooley, Godzik and Friedberg, 2010

Yarza et al., 2014

Youssef et al., 2015

Questions

How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?

Carl Woese proposed an “original” 12 bacterial phyla. 16S rRNA sequencing added multiple deep phyla to this model. It is believed that the actual number of phyla far exceeds the number found through 16S rRNA sequencing, because many of the phyla that exist in rare biospheres cannot be cultured. Currently 89 bacterial phyla are recognised (2016). The phyla that exist in these rare or ‘shadow’ biospheres supposedly outnumber the currently known phyla to a large degree. 79 different phyla are proposed from new primers being designed for 16S rRNA sequencing (Youssef et al., 2015).

The number of bacterial phyla with cultured representatives is about 30. The total number of phyla is expected to exceed 1000 (Madigan et al., n.d.; Yarza et al., 2014). As for archaea, about 20 phyla have currently been identified via 16S rRNA sequencing.

How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?

Many: thousands and increasing, e.g. 110217 on the EBI database. Types of environments: all (sediments, soil, gut, aquatic...), especially those where it is hard to culture communities in lab settings.

What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications) ?

Shotgun metagenomics (IMG/M, MG-RAST, NCBI/EBI)

  • Assembly - EULER
  • Binning - S-GSOM
  • Annotation - KEGG
  • Analysis pipelines - MEGAN 5

Marker gene metagenomics

  • Standalone software - OTUbase
  • Analysis pipelines - SILVA
  • Denoising - AmpliconNoise
  • Databases - Ribosomal Database Project (RDP)

What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?

Phylogenetic: vertical gene transfer; carry phylogenetic information allowing tree reconstruction; taxonomic; ideally single-copy.

Functional: more horizontal gene transfer; identify specific biogeochemical functions associated with measurable effects; not as useful for phylogeny.

What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?

Binning is the process of grouping sequences or reads that come from a single genome.

Types of algorithms:

  1. Align sequences to a database.
  2. Group sequences with each other based on DNA characteristics such as GC content and codon usage.

Risks and opportunities in binning

Risks: incomplete coverage of the genome sequence (the bin is not representative of the genome) and contamination from a different phylogeny (sequences with similar properties from other phyla can get sorted into the bin). These are not closed genomes. We want at most 5-10% differences in genomes for something to be considered different.
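
The composition-based approach can be illustrated with a toy example. This is only a sketch: the contig sequences are made up, the 50% GC threshold is an arbitrary assumption, and real binning tools combine many more signals than GC content alone.

```r
# Hypothetical contigs; sequences are invented purely for illustration
contigs <- c(
  contig_1 = "ATGCGCGGCCGCGGGC",
  contig_2 = "ATATTTAAATTTAGAT",
  contig_3 = "GGCCGCGCATGCGCGC",
  contig_4 = "TTTAAATATATTAGCA"
)

# Fraction of G and C bases in a DNA string
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

gc <- vapply(contigs, gc_content, numeric(1))

# Assign each contig to a bin using an arbitrary 50% GC cutoff (assumption)
bins <- ifelse(gc >= 0.5, "high_GC_bin", "low_GC_bin")

data.frame(contig = names(contigs), gc = round(gc, 2), bin = bins)
```

Contigs with similar GC content land in the same bin, which also shows the contamination risk noted above: unrelated organisms with similar base composition would be grouped together.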

Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?

  • Functional screens (biochemical)
  • Third-generation sequencing (nanopore): an in-between for single-cell and shotgun-type sequencing (unproven)
  • Single-cell sequencing (flow cytometry)
  • FISH probes (looking for highly conserved sequences and counting the cells)

Module 02 references

Madigan, M., Bender, K., Buckley, D., Sattley, W. and Stahl, D. (n.d.). Brock biology of microorganisms.

Ottesen, E., Young, C., Gifford, S., Eppley, J., Marin, R., Schuster, S., Scholin, C. and DeLong, E. (2014). Multispecies diel transcriptional oscillations in open ocean heterotrophic bacterial assemblages. Science, 345(6193), pp.207-212.

Wooley, J., Godzik, A. and Friedberg, I. (2010). A Primer on Metagenomics. PLoS Computational Biology, 6(2), p.e1000667.

Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F., Ludwig, W., Schleifer, K., Whitman, W., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014). Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology, 12(9), pp.635-645.

Youssef, N., Couger, M., McCully, A., Criado, A. and Elshahed, M. (2015). Assessing the global phylum level diversity within the bacterial domain: A review. Journal of Advanced Research, 6(3), pp.269-282.

Module 03

Module 03 Portfolio Content

  • Evidence worksheet_05
    • Completion status: done
    • Comments:
  • Problem set_04
    • Completion status: done
    • Comments:
  • Writing Assessment_03
    • Completion status: done
    • Comments:
  • Additional Readings
    • Completion status: none
    • Comments

Evidence worksheet 05

Welch et al., 2002

Part 1: Learning objectives:

  • Evaluate the concept of microbial species based on environmental surveys and cultivation studies
  • Explain the relationship between microdiversity, genomic diversity and metabolic potential
  • Comment on the forces mediating divergence and cohesion in natural microbial communities

General Questions:

. What were the main questions being asked?

The main question being posed is: how different are the genomes of strains from a single species of bacteria? How do the environments of each strain change the genomic makeup of the species? What defines a species? Which conserved genes would be both unique to E. coli and contribute towards distinguishing traits of the species?

. What were the primary methodological approaches used?

Cloning, sequencing, sequence analysis and annotation. Whole genome sequences were generated and sequencing was done by dye-terminator. Gene annotation was done through MAGPIE, and GLIMMER was used to define ORFs. Sequence alignment was done using BLAST, and matches were defined at 90% similarity.

. Summarize the main results or findings.

Above 70% of ORFs reported as unique to each strain were replaced with ORFs unique to each strain of uropathogenic bacteria. ORFs also had usage that differed significantly between each strain (52 of 61). However, ORF usage was not detectably different between CFT073 and EDL933. Two thirds of the shared island genes have unknown functions or are associated with phage or insertion sequence elements. Disease-producing ability is reflected in the absence of the type III secretion system and phage plasmids. Overall, the environmental niche defined many of the ORF island differences in each strain of E. coli. Even within pathogenic strains, the locality of infection caused significant variation in genomes between strains. This raises the possibility of defining new strains according to ecological niches, and indicates the significance of horizontal gene transfer in creating genetic diversity even within the species E. coli.

. Do new questions arise from the results?

What truly defines a strain? How does binning vs broadening our scope for definition of a strain affect our classification of organisms into various taxa? Will the classifications of various taxa change over time as new methods of analysis emerge?

. Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

Methods used in the paper originated from the early 2000s. The sequence analysis software used is no longer the standard for sequence annotation. However, it was interesting to note that BLAST is still the most common method for protein sequence alignment today. The authors wrote the paper from a perspective of genomic analysis. The significance of this paper applies to ecological genomics, and others involved in the field can easily comprehend the terminology used. However, the significance of the article needs to be articulated towards the broader general population in order to be directly understood.

Part 2: Learning objectives:

. Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution

Gene loss, duplication and acquisition are acted upon by the foundational forces of natural selection. These forces determine which genes end up in which ecological niches, and determine the genetic makeup of strains in specific environments.

. Identify common molecular signatures used to infer genomic identity and cohesion

Similar ORFs were inferred from a 90% match in sequence. The paper used ORFs encoding proteins in each strain as a basis of comparison. In other studies, 16S rRNA is the gold standard for assaying genetic diversity in microbes.

. Differentiate between mobile elements and different modes of gene transfer

Mobile elements and gene transfer mean that the genotype of an organism can change drastically, allowing a microbe to survive in an otherwise inhospitable environment. Because these genes can come from completely different species and even different phyla of microbes, this can make classification of species difficult.

Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.

Even within the same SSU group, each genotype of each cell is going to differ. As such, each genome is like a snowflake: no two genomes are exactly alike.

Depending on the localization of each strain within the human body, the ORF islands that are present or being expressed are specific to functions that fulfill a strain's requirements for survival within that specific area.

Genes can be transferred from cells of different species, and even phyla, that have adapted to living in such an environment. Further, mutations could also occur over evolutionary time, allowing a strain that is otherwise not able to survive in an environment to now do so. Viral infection could also contribute to the insertion of sequences allowing for these adaptive traits. Finally, these same mutations and gene transfers can cause strains to shift from being commensal within a body to being pathogenic. The genes responsible are often called virulence genes and lie within pathogenicity islands.

Problem Set 04

Kunin et al. 2010 Lundin et al. 2012

Learning objectives:

  • Gain experience estimating diversity within a hypothetical microbial community

Outline:

In class Day 1:

  1. Define and describe species within your group’s “microbial” community.
  2. Count and record individuals within your defined species groups.
  3. Remix all species together to reform the original community.
  4. Each person in your group takes a random sample of the community (i.e. divide up the candy).

Assignment:

  1. Individually, complete a collection curve for your sample.
library(kableExtra)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.7.2     v stringr 1.2.0
## v readr   1.1.1     v forcats 0.2.0
## -- Conflicts ---------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(knitr)

example_data1 = data.frame(
    number = c(1,2,3),
    name = c("lion", "tiger", "bear"),
    characteristics = c("brown cat", "striped cat", "not a cat"),
    occurences = c(2, 4, 1)
  )  

bin1 = data.frame(
  Number = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15),
  Name = c("Rigoa","Skittles","MandMs","MikeandIkes","Gummybears","Lego","Gumdrops","fruitgummies","macrophage","cokebottles","Gummywhitedrops","Watermelon","RedGreenFish","Kisses","Redsnakes"),
  Characteristics = c("long gummies","sour candy with shell","Chocolates with shell","Long chewy beans","Bear shaped gummies","Brick shaped hard candy","large round chewy candy with hard shell","fruit shaped gummies","octopus shaped gummies coated sugar","coke bottle shaped gummies","striped disk gummies coated suger","watermelon coloured and sphere shaped gummies","red and green fish shaped gummies","teardrop shaped chocolates","long thin red snake gummies"),
  Occurences = c(7,197,218,199,91,18,24,2,6,3,3,1,1,16,13)
 )  
  
bin1 %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
Number | Name | Characteristics | Occurences
1 | Rigoa | long gummies | 7
2 | Skittles | sour candy with shell | 197
3 | MandMs | Chocolates with shell | 218
4 | MikeandIkes | Long chewy beans | 199
5 | Gummybears | Bear shaped gummies | 91
6 | Lego | Brick shaped hard candy | 18
7 | Gumdrops | large round chewy candy with hard shell | 24
8 | fruitgummies | fruit shaped gummies | 2
9 | macrophage | octopus shaped gummies coated sugar | 6
10 | cokebottles | coke bottle shaped gummies | 3
11 | Gummywhitedrops | striped disk gummies coated suger | 3
12 | Watermelon | watermelon coloured and sphere shaped gummies | 1
13 | RedGreenFish | red and green fish shaped gummies | 1
14 | Kisses | teardrop shaped chocolates | 16
15 | Redsnakes | long thin red snake gummies | 13
#The "organisms" in this table account for all candy given to us; none were left out. Rare or unclassifiable species were given their own bin with descriptions of how they differed. 


#Part 02
bin2 = data.frame(
  x = 1:141,  # cumulative number of individuals classified
  y = c(1,2,2,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,6,6,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8)
)

ggplot(bin2, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'

#The curve produced is observed to have flattened out after 56 cells have been sampled.
#The cap on number of species observed is limited by the sampling size: only about half of all species have been observed from this small sample
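
The y values in bin2 above were entered by hand. A minimal sketch of deriving the same kind of curve from the order in which individuals were drawn might look like the following; the `draws` vector is hypothetical and is not part of the original data.

```r
# Hypothetical order in which individuals were drawn from a sample (assumption)
draws <- c("Skittles", "MandMs", "MandMs", "Gummybears", "Skittles",
           "Kisses", "MandMs", "Lego", "Skittles", "Kisses")

# A draw adds a new species only the first time that species appears
is_new_species <- !duplicated(draws)

collection_curve <- data.frame(
  x = seq_along(draws),       # cumulative number of individuals classified
  y = cumsum(is_new_species)  # cumulative number of species observed
)
collection_curve
```

Because the result uses the same x and y column names as bin2, it could be passed to the same ggplot() call.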
  1. Calculate alpha-diversity based on your original total community and your individual sample.
#Part 03

Spec1 = 7/799
Spec2 = 197/799
Spec3 = 218/799
Spec4 = 199/799
Spec5 = 91/799
Spec6 = 18/799
Spec7 = 24/799
Spec8 = 2/799
Spec9 = 6/799
Spec10 = 3/799
Spec11 = 3/799
Spec12 = 1/799
Spec13 = 1/799
Spec14 = 16/799
Spec15 = 13/799

1/(Spec1^2 + Spec2^2 + Spec3^2 + Spec4^2 + Spec5^2 + Spec6^2 + Spec7^2 + Spec8^2 + Spec9^2 + Spec10^2 + Spec11^2 + Spec12^2 + Spec13^2 + Spec14^2 + Spec15^2)
## [1] 4.706271
#The simpson reciprocal index for my total original community is 4.706271.

s1 = 31/141
s2 = 6/141
s3 = 29/141
s4 = 46/141
s5 = 17/141
s6 = 5/141
s7 = 3/141
s8 = 4/141

1/(s1^2+ s2^2 + s3^2 + s4^2 +s5^2 + s6^2 + s7^2 + s8^2)
## [1] 4.631027
#The simpson reciprocal index for my own sample is 4.631027.

8 
## [1] 8
# The chao1 estimate for my sample is 8

15+2^2/26
## [1] 15.15385
#The chao1 estimate for the total original community is 15.15385.

#Part04
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
bin1_diversity = 
  bin1 %>% 
  select(Name, Occurences) %>% 
  spread(Name, Occurences)
bin3 = data.frame(
  Number = c(1,2,3,4,5,6,7,8),
  Name = c("Skittles","Bigballs","M&Ms","Jellybeans","Gummybears","Bricks","Kisses","Redsnakes"),
  Characteristics = c("sour candy with shell","large round chewy candy with hard shell","Chocolates with shell","Long chewy beans","Bear shaped gummies","Brick shaped hard candy","teardrop shaped chocolates","long thin red snake gummies"),
  Occurences = c(31,6,29,46,17,5,3,4)
)  

bin3_diversity = 
  bin3 %>% 
  select(Name, Occurences) %>% 
  spread(Name, Occurences)

diversity(bin1_diversity, index="invsimpson")
## [1] 4.706271
#Simpson Reciprocal Index for the total organism pool is 4.706271.
specpool(bin1_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All      15   15       0    15        0    15   15       0 1
#The chao1 estimate for the total organism pool using specpool is 15.
diversity(bin3_diversity, index="invsimpson")
## [1] 4.631027
#Simpson Reciprocal Index for my sample is 4.631027.
specpool(bin3_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All       8    8       0     8        0     8    8       0 1
#The chao1 estimate for my sample using specpool is 8.

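# The Simpson Reciprocal Indices and the chao1 estimate for my sample match the previous calculations. The chao1 estimate for the total community from specpool (15) is slightly lower than the hand-calculated value (15.15385).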

In class Day 2:

  1. Compare diversity between groups.

Part 1: Description and enumeration

Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.

Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.

Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.

For example, load in the packages you will use.

#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)

Then load in the data. You should use a similar format to record your community data.

example_data1 = data.frame(
  number = c(1,2,3),
  name = c("lion", "tiger", "bear"),
  characteristics = c("brown cat", "striped cat", "not a cat"),
  occurences = c(2, 4, 1)
)

Finally, use these data to create a table.

example_data1 %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
number  name   characteristics  occurences
1       lion   brown cat        2
2       tiger  striped cat      4
3       bear   not a cat        1

For your community:

  • Construct a table listing each species, its distinguishing characteristics, the name you have given it, and the number of occurrences of the species in the collection.
  • Ask yourself if your collection of microbial cells from seawater represents the actual diversity of microorganisms inhabiting waters along the Line-P transect. Were the majority of different species sampled or were many missed?

Part 2: Collector’s curve

To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.

To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.

For example, we load in these data.

example_data2 = data.frame(
  x = c(1,2,3,4,5,6,7,8,9,10),
  y = c(1,2,3,4,4,5,5,5,6,6)
)

And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.

ggplot(example_data2, aes(x=x, y=y)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
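
The counting procedure described above can also be automated. The following is a minimal sketch, assuming the cells were recorded by name in the order they were drawn. The `draws` vector and the `collectors_curve` helper are hypothetical names used only for illustration and are not part of the worksheet.

```r
# Hypothetical record of species names, in the order the cells were drawn
draws <- c("lion", "tiger", "tiger", "bear", "lion", "tiger")

# Illustrative helper (not from the worksheet): builds collector's curve data
collectors_curve <- function(draws) {
  data.frame(
    x = seq_along(draws),            # cumulative number of individuals classified
    y = cumsum(!duplicated(draws))   # cumulative number of species observed
  )
}

curve_data <- collectors_curve(draws)
curve_data
```

The resulting data frame has `x` and `y` columns like `example_data2`, so it could be passed to the same `ggplot` call.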

For your sample:

  • Create a collector’s curve for your sample (not the entire original community).
  • Does the curve flatten out? If so, after how many individual cells have been collected?
  • What can you conclude from the shape of your collector’s curve as to your depth of sampling?

Part 3: Diversity estimates (alpha diversity)

Using the table from Part 1, calculate species diversity using the following indices or metrics.

Diversity: Simpson Reciprocal Index

\(\frac{1}{D}\) where \(D = \sum p_i^2\)

\(p_i\) = the fractional abundance of the \(i^{th}\) species

For example, using example data 1, with 3 species of 2, 4, and 1 individuals each, \(\frac{1}{D}\) =

species1 = 2/(2+4+1)
species2 = 4/(2+4+1)
species3 = 1/(2+4+1)

1 / (species1^2 + species2^2 + species3^2)
## [1] 2.333333

The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects both the number of species present (richness) and the relative proportions of each species within a community (evenness), it is a diversity metric. Consider that two communities can have the same number of species (equal richness) but differ in how evenly individuals are distributed among those species (unequal evenness), which would result in different diversity values.
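
The relationship between evenness and the index can be checked with a small helper. This is a minimal sketch: `inv_simpson` is a hypothetical function name, and the perfectly even community is made up for comparison.

```r
# Illustrative helper (not from the worksheet): Simpson Reciprocal Index, 1/D
inv_simpson <- function(counts) {
  p <- counts / sum(counts)  # fractional abundance of each species
  1 / sum(p^2)
}

inv_simpson(c(2, 4, 1))  # example data 1: 2.333333
inv_simpson(c(3, 3, 3))  # made-up even community: 3, the number of species
```

With equal abundances the index reaches its maximum, the species count. Any skew in the proportions lowers it.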

  • What is the Simpson Reciprocal Index for your sample?
  • What is the Simpson Reciprocal Index for your original total community?

Richness: Chao1 richness estimator

Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.

\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)

\(S_{obs}\) = total number of species observed

\(a\) = species observed once

\(b\) = species observed twice or more

So for our previous example community of 3 species with 2, 4, and 1 individuals each, \(S_{chao1}\) =

3 + 1^2/(2*2)
## [1] 3.25
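
The same calculation can be wrapped in a function. This is a minimal sketch that follows the definitions given above, in particular counting b as species observed twice or more. That is an assumption taken from this worksheet, and other formulations of Chao1 may define b differently. The name `chao1_worksheet` is hypothetical.

```r
# Illustrative helper (not from the worksheet or from vegan)
chao1_worksheet <- function(counts) {
  s_obs <- sum(counts > 0)    # total number of species observed
  a <- sum(counts == 1)       # species observed once
  b <- sum(counts >= 2)       # species observed twice or more (worksheet definition)
  if (b == 0) return(NA_real_)  # estimate is undefined when b is zero
  s_obs + a^2 / (2 * b)
}

chao1_worksheet(c(2, 4, 1))  # example community: 3.25
```

When no species is observed only once, a is zero and the estimate equals the observed richness.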
  • What is the chao1 estimate for your sample?
  • What is the chao1 estimate for your original total community?

Part 4: Alpha-diversity functions in R

We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.

library(vegan)

First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table with species as columns and samples as rows (of which you only have 1).

example_data1_diversity = 
  example_data1 %>% 
  select(name, occurences) %>% 
  spread(name, occurences)

example_data1_diversity
##   bear lion tiger
## 1    1    2     4
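
For readers without the tidyverse loaded, the same one-row species table can be built in base R. This is a sketch under that assumption, not the method used in the worksheet. `species_table` is a hypothetical name, and the columns keep the original row order rather than the alphabetical order shown above.

```r
# Same example data as above, reduced to the two columns that are needed
example_data1 <- data.frame(
  name = c("lion", "tiger", "bear"),
  occurences = c(2, 4, 1)
)

# Transpose the counts into a single row, then label each column by species
species_table <- as.data.frame(t(example_data1$occurences))
names(species_table) <- as.character(example_data1$name)
species_table
```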

Then we can calculate the Simpson Reciprocal Index using the diversity function.

diversity(example_data1_diversity, index="invsimpson")
## [1] 2.333333

And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number, so the value will be slightly different than what you’ve calculated above.

specpool(example_data1_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All       3    3       0     3        0     3    3       0 1

In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.

For your sample:

  • What are the Simpson Reciprocal Indices for your sample and community using the R function?
  • What are the chao1 estimates for your sample and community using the R function?
    • Verify that these values match your previous calculations.

Part 5: Concluding activity

  • How does the measure of diversity depend on the definition of species in your samples?

When we make the definition of a species stricter, the same collection is divided into a greater number of species, so our collector’s curve reaches a higher plateau and flattens out later. Our measured diversity thus increases as our definition of a species becomes stricter.

  • Can you think of alternative ways to cluster or bin your data that might change the observed number of species?

Sorting also by colour would increase diversity because of increased strictness.

  • How might different sequencing technologies influence observed diversity in a sample?

Our definition of a species on a genetic level greatly affects the observed diversity in a sample.
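
The effect of the binning criteria discussed in this part can be illustrated numerically. The following is a minimal sketch with made-up counts: the split by colour is hypothetical, and `inv_simpson` is an illustrative helper rather than a vegan function.

```r
# Illustrative helper: Simpson Reciprocal Index from raw counts
inv_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Hypothetical counts: the same 60 cells binned with two different criteria
coarse_bins <- c(skittles = 31, mms = 29)                                 # by candy type
fine_bins   <- c(skittles_red = 16, skittles_green = 15, mms = 29)        # also by colour

length(coarse_bins)       # observed richness under the coarse definition
length(fine_bins)         # observed richness under the stricter definition
inv_simpson(coarse_bins)
inv_simpson(fine_bins)
```

In this made-up case, the stricter definition yields both a higher richness and a higher Simpson Reciprocal Index from exactly the same cells.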

Writing assignment 03

The global microbial biomass constitutes nearly one half of all biomass on Earth (Whitman et al., 1998). Microbes are the most abundant living organisms and are responsible for driving the metabolic engines that enable all other life to exist. In addition, since the discovery that microbes can act as pathogens, they have become key subjects in the study of disease. They are the world’s oldest forms of life, having existed long before any animal, and hold the key to unlocking the grand question of how we came to be. As such, great importance has been placed on the need to study and classify microbes to better understand how our existence and history are tied to theirs. Yet, despite the importance of classifying the vast diversity of microbes, significant challenges remain in doing so. In this essay, as part of the broader aim of synthesizing what we have learned in MICB 425, I will detail the challenges involved in defining a microbial species in the context of our current reliance on 16S rRNA sequencing to classify Operational Taxonomic Units (OTUs). Following this, I will discuss the phenomenon of Horizontal Gene Transfer (HGT), specifically how it hinders our ability to trace the origin of species and, along with that, the origin of metabolic pathways. Finally, I will explain why it is necessary to have a clear definition of a species, even as that definition shifts with the advent of amplicon sequence variants (ASVs), which are about to replace OTUs as the standard unit of a species.

The advent of genome sequencing technology has enabled us to study microbes in ways that were previously impossible through culture-based identification methods alone. Bypassing the cultivation limitation in detecting microbial life has allowed us to estimate that we had only been able to capture about 1% of all microbial diversity up to that point (Chen et al., 2013). 16S rRNA sequencing compares the sequences of the 16S rRNA found within the small ribosomal subunit. This sequence is used because all translationally active microbial life contains it. This has two primary benefits. The first is that sequences within this region are conserved among taxonomically similar organisms, allowing us to construct a phylogenetic tree that links all microbial life together by tracing lineages of inheritance back to the common ancestor of all life on Earth. The second is that studying only organisms that contain translationally active machinery allows us to narrow the scope of study to microbes that contribute to the global metabolic engines that drive Earth’s biogeochemical cycles (Falkowski, Fenchel and Delong, 2008). From this, the concept of OTUs representing species came to be. The current standard is to cluster sequences that share at least 97% similarity into a single OTU, and this threshold defines what a species is in the status quo. To derive the OTUs present in each location, samples of the environment are gathered and genetic material is isolated from the cells they contain. PCR amplification is applied to the extracted DNA, and the resulting sequences are processed through bioinformatics pipelines for compositional analysis of the samples (Finotello, Mastrorilli and Di Camillo, 2016).

Challenges in defining a species within the microbial world thus stem from this definition. The primary concern with any definition of a species is whether it accurately reflects what we would recognise as a species in other contexts. This is illustrated by the rift that we see when we compare the microbial ecology demarcation of a species against that of other taxonomic disciplines. If we were to apply the same standard of 97% sequence similarity to animal classification, all primates would be considered a single species (Lumen learning, 2018). As such, it has been suggested that the current 97% identity threshold used to define 16S rRNA OTUs be updated (Edgar, 2017). This leads to a further challenge: the phenotypic differences that exist within what we would recognise as a single species. Could microbes of the same species display traits that are dissimilar enough that, in other contexts, they would be considered different species without much controversy? The answer appears to be yes when we examine strains. There is an enormous amount of strain-to-strain variation, particularly between pathogenic and commensal strains. For example, when the genomes of the uropathogenic E. coli strain CFT073, the enterohemorrhagic strain EDL933 and the laboratory strain MG1655 were compared, only 39.2% of their combined protein set, representing the functional elements within these lifeforms, was common to all three strains (Welch et al., 2002). This presents a case for updating our definition of an OTU so that it aligns with strains, with the justification that this model would be more biologically informative (Edgar, 2017). However, this updated definition also has problems. Some strains have similar phenotypes to each other, creating a situation where microbes that belong in the same OTU would be incorrectly assigned to separate taxonomic units.
The next challenge lies in the use of 16S rRNA sequences for deriving our definitions of a species. It is clear that our current definition of a microbial species lacks the biological relevance needed to translate our study of ecology into practical change, but on a broader level even the use of 16S rRNA has limitations. The 16S rRNA gene makes up a small part of an organism’s entire genome and, apart from recognition of mRNA, has no direct impact on the catalytic activity in a cell. It is therefore no surprise that multi-omic sequence information tells us a great deal more about the activity levels of individuals within a specific taxonomic unit than 16S rRNA alone (Hawley et al., 2017). A similar challenge in the definition of a species arises from differences in the methods employed to obtain such sequences. The sequencing depth, sample preparation and storage, bioinformatics pipelines and even the sequencing platform used to derive sequences all affect how many species are observed. This is evidenced by the fact that diversity estimates differ to varying degrees according to variations in any step of the study (Allali et al., 2017). Furthermore, the diversity estimates employed in microbial ecology were originally developed for macroecology. Errors in sequencing techniques such as pyrosequencing, originally introduced as a means of increasing sequencing depth, lead to inflation of diversity estimates (Kunin et al., 2010). As such, we are currently largely ignorant of the biases and errors that are present in such estimates (Finotello, Mastrorilli and Di Camillo, 2016). What this illustrates is that our definition of a species is not yet fixed and will depend on the methods used to study the 16S rRNA of microbes. Further complicating matters is the fact that pipelines fundamentally differ in the ways in which they classify OTUs. 
Some approaches make use of a reference sequence to cluster OTUs based on similarity to the reference, while others cluster OTUs according to sequence distance (Chen et al., 2013; Callahan, McMurdie and Holmes, 2017). When OTUs are clustered without reference to a database, data cannot be compared between studies. In combination, these factors make it difficult not only to classify species of microbes, but also to derive their ancestry and therefore trace the origins of metabolic activity in biogeochemical cycles.

How does HGT influence our understanding of this problem? HGT creates an evolutionary dynamic that is different from our typical understanding of phylogeny. At the core of this idea lies our construction of the microbial phylogenetic tree. The way this tree is drawn indicates that all microbial life originates from a single common ancestor and, by that same logic, from a single cell thought to be the first living form on Earth. HGT disrupts this by forcing us to consider the metabolic implications of such an organism bearing alone the burden of surviving the extremely hostile environment of early Earth. To do this, the metabolic machinery to draw energy from a nutrient source must have existed in its entirety, so that a fully functional cell existed from the onset of life. It is more likely that the earliest organisms evolved from entities on the boundary between living and non-living, and quickly split the burden of the metabolic processing needed to derive energy from an elemental source through the mechanism of HGT. As such, the earliest lifeforms need to be considered not as a single cell but as a community of cells, promiscuously sharing information through HGT (Nisbet and Sleep, 2001). Therefore, even for the conceptually simplest case of a species, that of the universal common ancestor, HGT complicates our definition of the term “species”. Further along the tree, with increasing evolutionary complexity, HGT links parts of the tree that would otherwise occur far apart and be clearly distinct from each other (Nisbet and Sleep, 2001). Between strains of what would otherwise be considered the same species, genomic islands residing within the common backbone are acquired via HGT (Welch et al., 2002). These occurrences point to the existence of genetic reservoirs, which are responsible for the storage and distribution of genes across species (Sogin et al., 2006). 
Together, these dynamics force us to imagine the true global phylogenetic distribution as more of a mangrove forest or a delta as opposed to a tree (Nisbet and Sleep, 2001), where mechanisms for both vertical and horizontal gene transfer exist, which further limit our ability to state a firm definition of a species.

Given the challenges that currently exist in the use of OTUs as the definitional markers of microbial species, and taking into consideration the increased complexity presented by HGT, we need to question the importance of maintaining a clear definition of species. To address this, we must first consider why we require a definition of microbial species and how we apply this definition in practice. Part of the reason why classification of microbes is important is their sheer number. It would be physically impossible to analyse each cell we encounter daily to determine what it is. Classification, especially relating to the metabolic pathways that reside within cells, allows us to derive the nature of microbes residing within an ecological niche (BBC, 2018). From the generalized knowledge of an environmental niche, we can then expedite the translation of knowledge about specific microbes into action. This saves time, which is important in two major fields: environmental engineering and medicine. In the former, to effectively mitigate the effects of climate change on specific regions of agricultural importance, a sound understanding of the microbial composition within such an area will alert us when a disruption of the food chain at the level of primary production is occurring (Torres-Beltrán et al., 2017; Hawley et al., 2017). In medicine, understanding whether a pathogenic, enterohemorrhagic, or commensal strain is present can provide more insight into the mechanisms behind resistance to therapy (Welch et al., 2002). With the need for a clear definition of microbial species established, we also find optimism in the fact that the development of new tools allows us to refine this definition so that it more accurately reflects reality. The creation of ASVs overcomes some of the limitations imposed by OTUs. 
De novo OTUs rely on emergent features that are defined separately for each data set, while closed-reference OTUs rely on a reference database against which percentage similarity is compared to classify microbes. ASVs do not rely on an arbitrary percentage-similarity threshold to group sequences. They instead infer the biological sequences present in samples prior to the introduction of amplification and sequencing error (Callahan, McMurdie and Holmes, 2017). As such, they retain the biological relevance of de novo methods whilst allowing for cross-referencing between studies. This will allow future studies in microbial ecology to adopt a more standardized comparison, with improvements to biological relevance in the classification of rare species (Sogin et al., 2006).

In conclusion, the introduction of sequencing technology has allowed us to study the vast diversity of microbes with a depth and accuracy never before possible. However, significant challenges persist in our current understanding of what defines a species. This, compounded by the fact that HGT plays a key role in the distribution of genes, means that it is unlikely that we will ever arrive at a perfect definition of a species. However, the practical implications of refusing to define species altogether would be problematic. To this end, the introduction of new technologies allows us to reach a clearer, more precise and more applicable understanding of what defines a species.

Module 03 references

Allali, I., Arnold, J., Roach, J., Cadenas, M., Butz, N., Hassan, H., Koci, M., Ballou, A., Mendoza, M., Ali, R. and Azcarate-Peril, M. (2017). A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome. BMC Microbiology, 17(1).

BBC.co.uk. (2018). BBC - KS3 Bitesize Science - Variation and classification : Revision, Page 7. [online] Available at: http://www.bbc.co.uk/bitesize/ks3/science/organisms_behaviour_health/variation_classification/revision/7/ [Accessed 23 Apr. 2018].

Callahan, B., McMurdie, P. and Holmes, S. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal, 11(12), pp.2639-2643.

Chen, W., Zhang, C., Cheng, Y., Zhang, S. and Zhao, H. (2013). A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs. PLoS ONE, 8(8), p.e70837.

Edgar, R. (2017). Updating the 97% identity threshold for 16S ribosomal RNA OTUs. Bioinformatics.

Falkowski, P., Fenchel, T. and Delong, E. (2008). The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science, 320(5879), pp.1034-1039.

Finotello, F., Mastrorilli, E. and Di Camillo, B. (2016). Measuring the diversity of the human microbiota with targeted next-generation sequencing. Briefings in Bioinformatics, p.bbw119.

Courses.lumenlearning.com. (2018). Classification of Microorganisms | Boundless Microbiology. [online] Available at: https://courses.lumenlearning.com/boundless-microbiology/chapter/classification-of-microorganisms/ [Accessed 22 Apr. 2018].

Hawley, A., Torres-Beltrán, M., Zaikova, E., Walsh, D., Mueller, A., Scofield, M., Kheirandish, S., Payne, C., Pakhomova, L., Bhatia, M., Shevchuk, O., Gies, E., Fairley, D., Malfatti, S., Norbeck, A., Brewer, H., Pasa-Tolic, L., del Rio, T., Suttle, C., Tringe, S. and Hallam, S. (2017). A compendium of multi-omic sequence information from the Saanich Inlet water column. Scientific Data, 4, p.170160.

Kunin, V., Engelbrektson, A., Ochman, H. and Hugenholtz, P. (2010). Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology, 12(1), pp.118-123.

Lundin, D., Severin, I., Logue, J., Östman, Ö., Andersson, A. and Lindström, E. (2012). Which sequencing depth is sufficient to describe patterns in bacterial α- and β-diversity? Environmental Microbiology Reports, 4(3), pp.367-372.

Nisbet, E. and Sleep, N. (2001). The habitat and nature of early life. Nature, 409(6823), pp.1083-1091.

Sogin, M., Morrison, H., Huber, J., Mark Welch, D., Huse, S., Neal, P., Arrieta, J. and Herndl, G. (2006). Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences, 103(32), pp.12115-12120.

Torres-Beltrán, M., Hawley, A., Capelle, D., Zaikova, E., Walsh, D., Mueller, A., Scofield, M., Payne, C., Pakhomova, L., Kheirandish, S., Finke, J., Bhatia, M., Shevchuk, O., Gies, E., Fairley, D., Michiels, C., Suttle, C., Whitney, F., Crowe, S., Tortell, P. and Hallam, S. (2017). A compendium of geochemical information from the Saanich Inlet water column. Scientific Data, 4, p.170159.

Welch, R., Burland, V., Plunkett, G., Redford, P., Roesch, P., Rasko, D., Buckles, E., Liou, S., Boutin, A., Hackett, J., Stroud, D., Mayhew, G., Rose, D., Zhou, S., Schwartz, D., Perna, N., Mobley, H., Donnenberg, M. and Blattner, F. (2002). Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proceedings of the National Academy of Sciences, 99(26), pp.17020-17024.

Whitman, W., Coleman, D. and Wiebe, W. (1998). Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences, 95(12), pp.6578-6583.

Project 1

Please refer to the PDF file included in the MICB425_portfolio folder uploaded onto Github

Project 2

Please refer to the PDF file included in the MICB425_portfolio folder uploaded onto Github